Abstract: For a state-dependent DMC with input alphabet $\mathcal{X}$ and state alphabet $\mathcal{S}$ where the i.i.d. state sequence is known causally at the transmitter, it is shown that by using at most $|\mathcal{X}||\mathcal{S}| - |\mathcal{S}| + 1$ out of $|\mathcal{X}|^{|\mathcal{S}|}$ input symbols of Shannon's associated channel, the capacity is achievable. As an example of state-dependent channels with side information at the transmitter, $M$-ary signal transmission over the AWGN channel with additive $Q$-ary interference, where the sequence of i.i.d. interference symbols is known causally at the transmitter, is considered. For the special case where the Gaussian noise power is zero, a sufficient condition, which is independent of interference, is given for the capacity to be $\log_2 M$ bits per channel use. The problem of maximization of the transmission rate under the constraint that the channel input given any current interference symbol is uniformly distributed over the channel input alphabet is investigated. For this setting, the general structure of a communication system with optimal precoding is proposed.

I. Introduction 

Information transmission over channels with known interference at the transmitter has received a great deal of attention. A remarkable result on such channels was obtained by Costa, who showed that the capacity of the additive white Gaussian noise (AWGN) channel with additive Gaussian i.i.d. interference, where the sequence of interference symbols is known non-causally at the transmitter, is the same as the capacity of the AWGN channel [1]. Therefore, the interference does not incur any loss in the capacity. This result was extended to arbitrary interference (random or deterministic) by Erez et al. [2]. The result obtained by Costa does not hold when the sequence of interference symbols is known causally at the transmitter.

Channels with known interference at the transmitter are a special case of channels with side information at the transmitter, which were considered by Shannon [3] in the causal knowledge setting and by Gel'fand and Pinsker [4] in the non-causal knowledge setting.

Shannon considered a discrete memoryless channel (DMC) whose transition matrix depends on the channel state. A state-dependent discrete memoryless channel (SD-DMC) is defined by a finite input alphabet $\mathcal{X} = \{x_1, \ldots, x_{|\mathcal{X}|}\}$, a finite output alphabet $\mathcal{Y}$, and transition probabilities $p(y|x,s)$, where the state $s$ takes on values in a finite alphabet $\mathcal{S} = \{1, \ldots, |\mathcal{S}|\}$.

This work was supported by Nortel, the Natural Sciences and Engineering
Research Council of Canada (NSERC), and the Ontario Centres of Excellence 
Shannon [3] showed that the capacity of an SD-DMC where the i.i.d. state sequence is known causally at the encoder is equal to the capacity of an associated regular (without state) DMC with an extended input alphabet $\mathcal{T}$ and the same output alphabet $\mathcal{Y}$. The input alphabet of the associated channel is the set of all functions from the state alphabet to the input alphabet of the state-dependent channel. There are a total of $|\mathcal{X}|^{|\mathcal{S}|}$ such functions, where $|\cdot|$ denotes the cardinality of a set. Any of the functions can be represented by an $|\mathcal{S}|$-tuple $(x_{i_1}, x_{i_2}, \ldots, x_{i_{|\mathcal{S}|}})$ composed of elements of $\mathcal{X}$, implying that the value of the function at state $s$ is $x_{i_s}$, $s = 1, \ldots, |\mathcal{S}|$.

The capacity is given by [3]

$$C = \max_{p_T(t)} I(T; Y), \qquad (1)$$

where the maximization is taken over the probability mass function (pmf) of the random variable $T$.

In the capacity formula we can alternatively replace $T$ with $(X_1, \ldots, X_{|\mathcal{S}|})$, where $X_s$ is the random variable that represents the input to the state-dependent channel when the state is $s$, $s = 1, \ldots, |\mathcal{S}|$.
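As an illustration of the associated channel's input alphabet, the following minimal sketch enumerates every function from the state alphabet to the input alphabet as a tuple. The function names `shannon_strategies` and `apply_strategy`, and the zero-based state indexing, are assumptions made for this example and do not come from the paper.

```python
from itertools import product

def shannon_strategies(input_alphabet, num_states):
    """List every function from states {0, ..., num_states-1} to the input
    alphabet, each represented as a tuple indexed by the state."""
    return list(product(input_alphabet, repeat=num_states))

def apply_strategy(strategy, state):
    """Channel input actually sent when the current state is `state`."""
    return strategy[state]

strategies = shannon_strategies([0, 1], 3)
print(len(strategies))  # |X|^|S| = 2**3 strategies
```

The count grows exponentially in the number of states, which is why a bound on how many of these inputs are actually needed is useful.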

This paper is organized as follows. In Section II we derive an upper bound on the cardinality of Shannon's associated channel input alphabet needed to achieve the capacity. In Section III we introduce our channel model. In Section IV we investigate the capacity of the channel in the absence of noise. In Section V we consider maximizing the transmission rate under the constraint that the channel input given any current interference symbol is uniformly distributed over the channel input alphabet. We present the general structure of a communication system for the channel with causally-known discrete interference in Section VI. We conclude this paper in Section VII.



II. A Bound on the Cardinality of Shannon's Associated Channel Input Alphabet

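We can obtain the pmf of the channel output $Y$ as

$$p_Y(y) = \sum_{s=1}^{|\mathcal{S}|} p_S(s)\, p_{Y|S}(y|s) = \sum_{s=1}^{|\mathcal{S}|} p_S(s) \sum_{i=1}^{|\mathcal{X}|} p_{X|S}(x_i|s)\, p_{Y|X,S}(y|x_i, s), \qquad (2)$$

where $p_{X|S}(x_i|s) = p_{X_s}(x_i)$ is the marginal pmf of $X_s$.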
The capacity of the associated channel, which is the same as the capacity of the original state-dependent channel, is the maximum of $I(T;Y) = I(X_1 X_2 \cdots X_{|\mathcal{S}|}; Y)$ over the joint pmf values $p_{i_1 i_2 \cdots i_{|\mathcal{S}|}} = \Pr\{X_1 = x_{i_1}, \ldots, X_{|\mathcal{S}|} = x_{i_{|\mathcal{S}|}}\}$, i.e.,

$$C = \max_{p_{i_1 i_2 \cdots i_{|\mathcal{S}|}}} I(X_1 X_2 \cdots X_{|\mathcal{S}|}; Y). \qquad (3)$$



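The mutual information between $T$ and $Y$ is the difference between the entropies $H(Y)$ and $H(Y|T)$. It can be seen from (2) that $p_Y(y)$, and hence $H(Y)$, are uniquely determined by the marginal pmfs $\{p_{X_s}(x_i)\}_{i=1}^{|\mathcal{X}|}$, $s = 1, \ldots, |\mathcal{S}|$. The conditional entropy $H(Y|T)$ is given by

$$H(Y|T) = H(Y|X_1 X_2 \cdots X_{|\mathcal{S}|}) = \sum_{i_1=1}^{|\mathcal{X}|} \cdots \sum_{i_{|\mathcal{S}|}=1}^{|\mathcal{X}|} h_{i_1 \cdots i_{|\mathcal{S}|}}\, p_{i_1 \cdots i_{|\mathcal{S}|}}, \qquad (4)$$

where $h_{i_1 \cdots i_{|\mathcal{S}|}} = H(Y|X_1 = x_{i_1}, \ldots, X_{|\mathcal{S}|} = x_{i_{|\mathcal{S}|}})$.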



There are $|\mathcal{X}|^{|\mathcal{S}|}$ variables involved in the maximization problem (3). Each variable represents the probability of an input symbol of the associated channel. The following theorem concerns the number of nonzero variables required to achieve the maximum in (3).

Theorem 1: The capacity of the associated channel is achieved by using at most $|\mathcal{X}||\mathcal{S}| - |\mathcal{S}| + 1$ out of $|\mathcal{X}|^{|\mathcal{S}|}$ input symbols with nonzero probabilities.

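Proof: Denote by $\{\hat{p}^{(s)}_i = \hat{p}_{X_s}(x_i)\}_{i=1}^{|\mathcal{X}|}$ the pmf of $X_s$, $s = 1, 2, \ldots, |\mathcal{S}|$, induced by a capacity-achieving joint pmf $\{\hat{p}_{i_1 \cdots i_{|\mathcal{S}|}}\}_{i_1, \ldots, i_{|\mathcal{S}|}=1}^{|\mathcal{X}|}$. We limit the search for a capacity-achieving joint pmf to those joint pmfs that yield the same marginal pmfs as $\{\hat{p}_{i_1 \cdots i_{|\mathcal{S}|}}\}$. By limiting the search to this smaller set, the maximum of $I(X_1 \cdots X_{|\mathcal{S}|}; Y)$ remains unchanged since the capacity-achieving joint pmf $\{\hat{p}_{i_1 \cdots i_{|\mathcal{S}|}}\}$ is in the smaller set. But all joint pmfs in the smaller set yield the same $H(Y)$ since they induce the same marginal pmfs on $X_1, \ldots, X_{|\mathcal{S}|}$. Therefore, the maximization problem in (3) reduces to the linear minimization problem

$$
\begin{aligned}
\min \quad & \sum_{i_1=1}^{|\mathcal{X}|} \cdots \sum_{i_{|\mathcal{S}|}=1}^{|\mathcal{X}|} h_{i_1 \cdots i_{|\mathcal{S}|}}\, p_{i_1 \cdots i_{|\mathcal{S}|}} \\
\text{s.t.} \quad & \sum_{i_2=1}^{|\mathcal{X}|} \cdots \sum_{i_{|\mathcal{S}|}=1}^{|\mathcal{X}|} p_{i_1 \cdots i_{|\mathcal{S}|}} = \hat{p}^{(1)}_{i_1}, \quad i_1 = 1, \ldots, |\mathcal{X}|, \\
& \qquad \vdots \\
& \sum_{i_1=1}^{|\mathcal{X}|} \cdots \sum_{i_{|\mathcal{S}|-1}=1}^{|\mathcal{X}|} p_{i_1 \cdots i_{|\mathcal{S}|}} = \hat{p}^{(|\mathcal{S}|)}_{i_{|\mathcal{S}|}}, \quad i_{|\mathcal{S}|} = 1, \ldots, |\mathcal{X}|, \\
& p_{i_1 \cdots i_{|\mathcal{S}|}} \ge 0, \quad i_1, \ldots, i_{|\mathcal{S}|} = 1, 2, \ldots, |\mathcal{X}|.
\end{aligned} \qquad (5)
$$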

There are $|\mathcal{X}||\mathcal{S}|$ equality constraints in (5), out of which $|\mathcal{X}||\mathcal{S}| - |\mathcal{S}| + 1$ are linearly independent. From the theory of linear programming, the minimum of (5), and hence the maximum of $I(X_1 \cdots X_{|\mathcal{S}|}; Y)$, is achieved by a feasible solution with at most $|\mathcal{X}||\mathcal{S}| - |\mathcal{S}| + 1$ nonzero variables. ■

Theorem 1 states that at most $|\mathcal{X}||\mathcal{S}| - |\mathcal{S}| + 1$ out of $|\mathcal{X}|^{|\mathcal{S}|}$ input symbols of the associated channel need to be used with positive probability to achieve the capacity. However, in general one does not know which of the inputs must be used to achieve the capacity. If we knew the marginal pmfs for $X_1, \ldots, X_{|\mathcal{S}|}$ induced by a capacity-achieving joint pmf, we could obtain the capacity-achieving joint pmf itself by solving the linear program (5).
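The count of linearly independent marginal constraints can be checked numerically for small alphabets. The sketch below builds the zero-one matrix of the equality constraints and computes its exact rank over the rationals; the helper names `marginal_constraint_matrix` and `rank` are illustrative choices, and the check covers only the small cases listed, so it is a sanity check rather than a proof.

```python
from fractions import Fraction
from itertools import product

def marginal_constraint_matrix(num_inputs, num_states):
    """One row per (state s, input index i), one column per |S|-tuple.
    An entry is 1 when the tuple assigns input i to state s."""
    tuples = list(product(range(num_inputs), repeat=num_states))
    return [[1 if t[s] == i else 0 for t in tuples]
            for s in range(num_states) for i in range(num_inputs)]

def rank(matrix):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

for nx, ns in [(2, 2), (3, 2), (2, 3), (3, 3)]:
    # Compare the computed rank with |X||S| - |S| + 1.
    print(nx, ns, rank(marginal_constraint_matrix(nx, ns)), nx * ns - ns + 1)
```

The deficiency of $|\mathcal{S}| - 1$ reflects the fact that, for every state, the rows of that state sum to the all-ones row.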

III. The Channel Model 
We consider data transmission over the channel

$$Y = X + S + N, \qquad (6)$$

where $X$ is the channel input, which takes on values in a fixed real constellation

$$\mathcal{X} = \{x_1, x_2, \ldots, x_M\}, \qquad (7)$$

$Y$ is the channel output, $N$ is additive white Gaussian noise with power $P_N$, and the interference $S$ is a discrete random variable that takes on values in

$$\mathcal{S} = \{s_1, s_2, \ldots, s_Q\} \qquad (8)$$

with probabilities $r_1, r_2, \ldots, r_Q$, respectively. The sequence of i.i.d. interference symbols is known causally at the encoder.

The above channel can be considered as a special case of the state-dependent channels considered by Shannon, with one exception: the channel output alphabet is continuous. In our case, the likelihood function $f_{Y|X,S}(y|x,s)$ is used instead of the transition probabilities. We denote the input to the associated channel by $T$, which can also be represented as $(X_1, X_2, \ldots, X_Q)$, where $X_j$ is the random variable that represents the channel input when the current interference symbol is $s_j$, $j = 1, \ldots, Q$.

The likelihood function for the associated channel is given by

$$f_{Y|T}(y|t) = \sum_{j=1}^{Q} r_j\, f_{Y|X,S}(y|x_{i_j}, s_j) = \sum_{j=1}^{Q} r_j\, f_N(y - x_{i_j} - s_j), \qquad (9)$$

where $f_N$ denotes the pdf of the noise $N$, and $t$ is the input symbol of the associated channel represented by $(x_{i_1}, x_{i_2}, \ldots, x_{i_Q})$.

According to Theorem 1, the capacity of our channel is obtained by using at most $MQ - Q + 1$ out of $M^Q$ input symbols of the associated channel.
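The likelihood of the associated channel is a Gaussian mixture with one component per interference level. A minimal sketch of its evaluation follows; the function names and the argument layout (a tuple `t` holding the input sent for each interference level, in the same order as `s_levels` and `r_probs`) are assumptions for illustration.

```python
import math

def gaussian_pdf(z, noise_power):
    """Zero-mean Gaussian pdf with variance `noise_power`."""
    return math.exp(-z * z / (2.0 * noise_power)) / math.sqrt(2.0 * math.pi * noise_power)

def associated_likelihood(y, t, s_levels, r_probs, noise_power):
    """Mixture likelihood of the associated channel:
    sum over j of r_j * f_N(y - t_j - s_j), where t_j is sent when S = s_j."""
    return sum(r * gaussian_pdf(y - x - s, noise_power)
               for x, s, r in zip(t, s_levels, r_probs))

# Example: t cancels the interference exactly, so both components are centred at 0.
print(associated_likelihood(0.0, (1, -1), (-1, 1), (0.5, 0.5), 1.0))
```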

IV. The Noise-Free Channel 

We consider a special case where the noise power is zero in (6). In the absence of noise, the channel output $Y$ takes on at most $MQ$ different values since different $X$ and $S$ pairs may yield the same sum. If $Y$ takes on exactly $MQ$ different values, then it is easy to see that the capacity is $\log_2 M$ bits.¹ The decoder just needs to partition the set of all possible channel output values into $M$ subsets of size $Q$ corresponding to the $M$ possible inputs, and decide which subset the current received symbol belongs to.
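The partition-based decoder for the case of $MQ$ distinct outputs can be sketched as a lookup table. The function name `build_noise_free_decoder` is hypothetical, and the sketch assumes exactly representable (for example integer) signal points and interference levels so that sums can be compared for equality.

```python
def build_noise_free_decoder(xs, ss):
    """Map every noise-free output x + s back to x.
    Returns None when two (x, s) pairs collide, i.e. when the output
    takes fewer than M*Q distinct values."""
    table = {}
    for x in xs:
        for s in ss:
            y = x + s
            if y in table:
                return None
            table[y] = x
    return table

decoder = build_noise_free_decoder([0, 1], [0, 2])
print(decoder[1 + 2])  # recovers the input 1 from the output 3
```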

In general, where the cardinality of the set of channel output symbols can be less than $MQ$, we will show that under some condition on the channel input alphabet there exists a coding scheme that achieves the rate $\log_2 M$ in one use of the channel. We do this by considering a one-shot coding scheme which uses only $M$ (out of $M^Q$) inputs of the associated channel.

In a one-shot coding scheme, a message is encoded to a single input of the associated channel. Any input of the associated channel can be represented by a $Q$-tuple composed of elements of $\mathcal{X}$. Given that the current interference symbol is $s_j$, the $j$th element of the $Q$-tuple is sent through the channel. Therefore, one single message can result in (up to) $Q$ symbols at the output. For convenience, we consider the output symbols corresponding to a single message as a multi-set² of size (exactly) $Q$. If the $M$ multi-sets at the output corresponding to $M$ different messages are mutually disjoint, reliable transmission through the channel is possible.

Unfortunately, we cannot always find $M$ inputs of the associated channel such that the corresponding multi-sets are mutually disjoint. For example, consider a channel with the input alphabet $\mathcal{X} = \{0, 1, 2, 4\}$ and the interference alphabet $\mathcal{S} = \{0, 1, 3\}$. It is easy to check that for this channel we cannot find four triples composed of elements of $\mathcal{X}$ such that the corresponding multi-sets are mutually disjoint. In fact, by entropy calculations we can show that the capacity of the channel in this example is less than 2 bits.
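The claim in this example can be verified by exhaustive search. The sketch below is a plain backtracking search written for illustration (it is not a procedure from the paper); `find_disjoint_codebook` is a hypothetical name, and exact (integer) alphabets are assumed.

```python
from itertools import product

def find_disjoint_codebook(xs, ss):
    """Search for len(xs) tuples in xs^Q whose noise-free output sets are
    mutually disjoint. Returns the list of tuples, or None if none exists."""
    m = len(xs)
    # Each candidate tuple t is paired with the set of outputs t_j + s_j.
    candidates = [(t, frozenset(x + s for x, s in zip(t, ss)))
                  for t in product(xs, repeat=len(ss))]

    def extend(start, chosen, used):
        if len(chosen) == m:
            return list(chosen)
        for idx in range(start, len(candidates)):
            t, outputs = candidates[idx]
            if used.isdisjoint(outputs):
                result = extend(idx + 1, chosen + [t], used | outputs)
                if result is not None:
                    return result
        return None

    return extend(0, [], frozenset())

print(find_disjoint_codebook([0, 1, 2, 4], [0, 1, 3]))  # no valid codebook
print(find_disjoint_codebook([0, 1, 2, 3], [0, 1, 3]))  # one valid codebook
```

A short counting argument agrees with the search for the first alphabet: the outputs take only seven distinct values, none of which is reachable under all three interference levels, so each message needs at least two values and four messages would need eight.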

However, if we put some constraint on the channel input alphabet, the rate $\log_2 M$ is achievable.

Theorem 2: Suppose that the elements of the channel input alphabet $\mathcal{X}$ form an arithmetic progression. Then the capacity of the noise-free channel

$$Y = X + S, \qquad (10)$$

where the sequence of interference symbols is known causally at the encoder, equals $\log_2 M$ bits.

Proof: Let $\mathcal{Y}^{(q)}$ be the set of all possible outputs of the noise-free channel when the interference symbol is $s_q$, i.e.,

$$\mathcal{Y}^{(q)} = \{x_1 + s_q, x_2 + s_q, \ldots, x_M + s_q\}, \quad q = 1, \ldots, Q. \qquad (11)$$

The union of the sets $\mathcal{Y}^{(q)}$ is the set of all possible outputs of the noise-free channel.

Without loss of generality we can assume that $s_1 < s_2 < \cdots < s_Q$. The elements of $\mathcal{Y}^{(q)}$ form an arithmetic progression, $q = 1, \ldots, Q$. Furthermore, these $Q$ arithmetic progressions are shifted versions of each other.

¹ This is true even if the interference sequence is unknown to the encoder.
² A multi-set differs from a set in that each member may have a multiplicity greater than one. For example, {1, 3, 3, 7} is a multi-set of size four where 3 has multiplicity two.



Fig. 1. The elements of $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q+1)}$ shown as shifted versions of each other. The elements of $\mathcal{Y}^{(q+1)}$ up to $x_k + s_{q+1}$ appear in $\mathcal{Y}^{(j)}$.



We prove by induction on $Q$ that there exist $M$ mutually-disjoint multi-sets of size $Q$ composed of the elements of $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(Q)}$ (one element from each). If we can find such $M$ multi-sets of size $Q$, then we can obtain the corresponding $M$ $Q$-tuples of elements of $\mathcal{X}$ by subtracting the corresponding interference terms from the elements of the multi-sets. These $M$ $Q$-tuples can serve as the inputs of the associated channel to be used for sending any of $M$ distinct messages through the channel without error in one use of the channel, hence achieving the rate $\log_2 M$ bits per channel use.

For $Q = 1$, the statement of the theorem is true since we can take $\{x_1 + s_1\}, \{x_2 + s_1\}, \ldots, \{x_M + s_1\}$ as mutually-disjoint sets of size one.

Assume that there exist $M$ mutually-disjoint multi-sets of size $Q = q$. For $Q = q + 1$, we will have the new set of channel outputs $\mathcal{Y}^{(q+1)} = \{x_1 + s_{q+1}, x_2 + s_{q+1}, \ldots, x_M + s_{q+1}\}$. We consider two possible cases:

Case 1: None of the elements of $\mathcal{Y}^{(q+1)}$ appear in any of the multi-sets of size $Q = q$.

In this case, we include the elements of $\mathcal{Y}^{(q+1)}$ in the $M$ multi-sets arbitrarily (one element is included in each multi-set). It is obvious that the resulting multi-sets of size $Q = q + 1$ are mutually disjoint.

Case 2: Some of the elements of $\mathcal{Y}^{(q+1)}$ appear in some of the multi-sets of size $Q = q$.

Suppose that the largest element of $\mathcal{Y}^{(q+1)}$ which appears in any of the sets $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q)}$ (or equivalently, in any of the multi-sets of size $Q = q$) is $x_k + s_{q+1}$ for some $1 \le k \le M - 1$. Then, since $\mathcal{Y}^{(q+1)}$ is a shifted version of each of $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q)}$ and $s_{q+1} > s_q > \cdots > s_1$, exactly one of the sets $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q)}$, say $\mathcal{Y}^{(j)}$ for some $1 \le j \le q$, contains all elements of $\mathcal{Y}^{(q+1)}$ up to $x_k + s_{q+1}$. See Fig. 1. Since each of the disjoint multi-sets of size $Q = q$ contains just one element of $\mathcal{Y}^{(j)}$, the elements of $\mathcal{Y}^{(q+1)}$ up to $x_k + s_{q+1}$ appear in different multi-sets of size $Q = q$. We can form the disjoint multi-sets of size $q + 1$ by including these common elements in the corresponding multi-sets and including the elements of $\{x_{k+1} + s_{q+1}, \ldots, x_M + s_{q+1}\}$ in the remaining multi-sets arbitrarily. ■
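The induction in the proof can be read as an algorithm. The following is one possible rendering of it as a sketch, under the assumptions that the signal points form an arithmetic progression, the interference levels are distinct, and all values are exactly representable (integers or `fractions.Fraction`). The name `construct_codebook` and the "fill the remaining multi-sets in index order" rule for the arbitrary choices are illustrative.

```python
def construct_codebook(xs, ss):
    """Build M Q-tuples whose noise-free output multi-sets are mutually disjoint.
    Tuple components are ordered like the interference levels in `ss`."""
    xs = sorted(xs)
    order = sorted(range(len(ss)), key=lambda j: ss[j])  # process s_1 < ... < s_Q
    m = len(xs)
    multisets = [[] for _ in range(m)]  # outputs of each message, in processing order
    owner = {}                          # output value -> index of the multi-set holding it
    for j in order:
        taken, pending = set(), []
        for y in (x + ss[j] for x in xs):
            if y in owner:              # Case 2: value already used, keep it with its owner
                i = owner[y]
                if i in taken:
                    raise ValueError("precondition violated: two common values share a multi-set")
                multisets[i].append(y)
                taken.add(i)
            else:
                pending.append(y)       # new value, may go to any remaining multi-set
        free = [i for i in range(m) if i not in taken]
        for i, y in zip(free, pending):
            multisets[i].append(y)
            owner[y] = i
    codebook = []
    for ms in multisets:
        t = [None] * len(ss)
        for position, j in enumerate(order):
            t[j] = ms[position] - ss[j]  # subtract the interference to recover the input
        codebook.append(tuple(t))
    return codebook

print(construct_codebook([0, 1, 2, 3], [0, 1, 3]))
```

For the alphabets in the usage line, every column of the resulting codebook is a permutation of the input alphabet, so equiprobable messages induce a uniform input distribution for each interference level.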
The condition on the channel input alphabet in the statement of Theorem 2 is a sufficient condition for the channel capacity to be $\log_2 M$. However, it is not a necessary condition. For example, the statement of Theorem 2 without that condition is true for the case of $Q = 2$, because in the second iteration we do not need the arithmetic progression condition to form $M$ mutually-disjoint multi-sets of size two.

The proof of Theorem 2 is actually a constructive algorithm for finding $M$ (out of $M^Q$) inputs of the associated channel to be used with probability $\frac{1}{M}$ to achieve the rate $\log_2 M$ bits.

It is interesting to see that the set containing the $q$th elements of the $M$ $Q$-tuples obtained by the constructive algorithm is $\mathcal{X}$, $q = 1, \ldots, Q$. This is due to the fact that each multi-set contains one element from each of $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(Q)}$. Therefore, a uniform distribution on the $M$ $Q$-tuples induces a uniform distribution on $X_1, \ldots, X_Q$.

V. Uniform Transmission 
In the sequel, we study the maximization of the rate $I(X_1 \cdots X_Q; Y)$ over joint pmfs $\{p_{i_1 \cdots i_Q}\}_{i_1, \ldots, i_Q=1}^{M}$ that induce uniform marginal distributions on $X_1, \ldots, X_Q$, i.e.,

$$p^{(1)}_i = p^{(2)}_i = \cdots = p^{(Q)}_i = \frac{1}{M}, \quad i = 1, 2, \ldots, M, \qquad (12)$$

for which we show how to obtain the optimal input probability assignment. We call a transmission scheme that induces a uniform distribution on $X_1, \ldots, X_Q$ uniform transmission. The uniform distribution for $X_1, \ldots, X_Q$ implies a uniform distribution for $X$, the input to the state-dependent channel defined in (6).

In the previous section, we established that the capacity-achieving pmf for the asymptotic case of the noise-free channel induces uniform distributions on $X_1, \ldots, X_Q$ (provided that we can find $M$ $Q$-tuples such that the corresponding multi-sets are mutually disjoint).

Considering the constraints in (12), the maximization of $I(X_1 \cdots X_Q; Y)$ is reduced to the linear minimization problem

$$
\begin{aligned}
\min \quad & \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} h_{i_1 \cdots i_Q}\, p_{i_1 \cdots i_Q} \\
\text{s.t.} \quad & \sum_{i_2=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 \cdots i_Q} = \frac{1}{M}, \quad i_1 = 1, 2, \ldots, M, \\
& \qquad \vdots \\
& \sum_{i_1=1}^{M} \cdots \sum_{i_{Q-1}=1}^{M} p_{i_1 \cdots i_Q} = \frac{1}{M}, \quad i_Q = 1, 2, \ldots, M, \\
& p_{i_1 \cdots i_Q} \ge 0, \quad i_1, \ldots, i_Q = 1, 2, \ldots, M.
\end{aligned} \qquad (13)
$$



The same argument used in the last part of the proof of Theorem 1 can be used to show that the maximum is achieved by using at most $MQ - Q + 1$ inputs of the associated channel with positive probabilities. This is restated in the following corollary.

Corollary 1: The maximum of $I(X_1 \cdots X_Q; Y)$ over joint pmfs $\{p_{i_1 \cdots i_Q}\}_{i_1, \ldots, i_Q=1}^{M}$ that induce uniform marginal distributions on $X_1, X_2, \ldots, X_Q$ is achieved by a joint pmf with at most $MQ - Q + 1$ nonzero elements.

This result is independent of the coefficients $\{h_{i_1 \cdots i_Q}\}$. However, which probability assignment with at most $MQ - Q + 1$ nonzero elements is optimal depends on the coefficients $\{h_{i_1 \cdots i_Q}\}$. The coefficient $h_{i_1 \cdots i_Q}$ is determined by the interference levels $s_1, \ldots, s_Q$, the probabilities of the interference levels $r_1, \ldots, r_Q$, the noise power $P_N$, and the signal points $x_1, x_2, \ldots, x_M$. The optimal probability assignment is obtained by solving the linear programming problem (13) using the simplex method [6].

Fig. 2. Maximum mutual information vs. SNR (dB) for the channel with $\mathcal{X} = \mathcal{S} = \{-1, +1\}$ and $r_1 = r_2 = \frac{1}{2}$.

A. Two-Level Interference 

If the number of interference levels is two, i.e., $Q = 2$, we can make a stronger statement than Corollary 1.

Theorem 3: The maximum of $I(X_1 X_2; Y)$ over $\{p_{i_1 i_2}\}_{i_1, i_2=1}^{M}$ with uniform marginal pmfs for $X_1$ and $X_2$ is achieved by using exactly $M$ out of $M^2$ inputs of the associated channel with probability $\frac{1}{M}$.

Proof: The equality constraints of (13) can be written in matrix form as

$$A\mathbf{p} = \mathbf{1}, \qquad (14)$$

where $A$ is a zero-one $MQ \times M^Q$ matrix, $\mathbf{p}$ is $M$ times the vector containing all $p_{i_1 \cdots i_Q}$s in lexicographical order, and $\mathbf{1}$ is the all-one $MQ \times 1$ vector.

For $Q = 2$, it is easy to check that $A$ is the vertex-edge incidence matrix of $K_{M,M}$, the complete bipartite graph with $M$ vertices in each part. Therefore, $A$ is a totally unimodular matrix³ [5]. Hence, the extreme points of the feasible region $F = \{\mathbf{p} : A\mathbf{p} = \mathbf{1}, \mathbf{p} \ge 0\}$ are integer vectors. Since the optimal value of a linear optimization problem is attained at one of the extreme points of its feasible region, the minimum in (13) is achieved at an all-integer vector $\mathbf{p}^*$. Considering that $\mathbf{p}^*$ satisfies (14), it can only be a zero-one vector with exactly $M$ ones. ■

Fig. 2 depicts the maximum mutual information (for the uniform transmission scenario) vs. SNR for the channel with $\mathcal{X} = \mathcal{S} = \{-1, +1\}$ and equiprobable interference symbols. The mutual information vs. SNR curve for the interference-free AWGN channel with equiprobable input alphabet $\{-1, +1\}$ is plotted for comparison purposes. As can be seen, for low SNRs, the input probability assignment $p_{11} = p_{22} = \frac{1}{2}$ is optimal, whereas at high SNRs, the input probability assignment $p_{12} = p_{21} = \frac{1}{2}$ is optimal. The maximum achievable rate for uniform transmission is the upper envelope of the two curves corresponding to the different input probability assignments. Also, it can be observed that the achievable rate approaches $\log_2 2 = 1$ bit per channel use as SNR increases, in agreement with the fact that we established in Section IV for the noise-free channel.

³ A totally unimodular matrix is a matrix for which every square submatrix has determinant 0, 1, or $-1$.

It turns out from the proof of Theorem 3 that the optimum solution of the linear optimization problem, $\mathbf{p}^*$, is a zero-one vector. So, if we add the integrality constraint to the set of constraints in (14), we still obtain the same optimal solution. The resulting integer linear optimization problem is called the assignment problem [5], which can be solved using low-complexity algorithms such as the Hungarian method [6].
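For very small $M$, the assignment problem can also be solved by enumerating permutations, which makes the two-level case easy to experiment with. The sketch below does this in place of the Hungarian method and estimates each coefficient $h_{i_1 i_2}$ as the differential entropy of a two-component Gaussian mixture by midpoint-rule integration. The function names, the integration range of eight standard deviations, and the step count are assumptions of this example; the noise power is passed directly rather than through an SNR definition.

```python
import math
from itertools import permutations

def mixture_entropy(means, weights, noise_power, steps=4000):
    """Differential entropy (in bits) of a Gaussian mixture with common
    variance `noise_power`, estimated with the midpoint rule."""
    sigma = math.sqrt(noise_power)
    lo, hi = min(means) - 8 * sigma, max(means) + 8 * sigma
    dy = (hi - lo) / steps
    norm = math.sqrt(2 * math.pi * noise_power)
    total = 0.0
    for n in range(steps):
        y = lo + (n + 0.5) * dy
        f = sum(w * math.exp(-(y - mu) ** 2 / (2 * noise_power))
                for mu, w in zip(means, weights)) / norm
        if f > 0.0:
            total -= f * math.log2(f) * dy
    return total

def best_assignment(xs, ss, rs, noise_power):
    """Q = 2 only: return the permutation pi minimising sum_i h[i][pi[i]],
    where the pair (i, pi[i]) is used with probability 1/M."""
    m = len(xs)
    h = [[mixture_entropy([xs[i1] + ss[0], xs[i2] + ss[1]], rs, noise_power)
          for i2 in range(m)] for i1 in range(m)]
    return min(permutations(range(m)),
               key=lambda pi: sum(h[i][pi[i]] for i in range(m)))

xs = ss = [-1, 1]
print(best_assignment(xs, ss, [0.5, 0.5], 10.0))   # high noise power
print(best_assignment(xs, ss, [0.5, 0.5], 0.01))   # low noise power
```

With zero-based indices, the identity permutation corresponds to $p_{11} = p_{22} = \frac{1}{2}$ and the swap to $p_{12} = p_{21} = \frac{1}{2}$. Enumeration costs $M!$ evaluations, so it is only a stand-in for the Hungarian method on toy sizes.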

B. Integrality Constraint for the Q-Level Interference 

The fact that, for the case $Q = 2$, there exists an optimal $\mathbf{p}$ which is a zero-one vector with exactly $M$ ones simplifies the encoding operation, because any encoding scheme just needs to work on a subset of size $M$ of the associated channel input alphabet with equal probabilities $\frac{1}{M}$.

For $Q > 2$, $A$ is not a totally unimodular matrix. Therefore, not all extreme points of the feasible region defined by $A\mathbf{p} = \mathbf{1}$, $\mathbf{p} \ge 0$ are integer vectors. However, at the expense of a possible loss in rate, we may add the integrality constraint in this case. The resulting optimization problem is called the multi-dimensional assignment problem [7]. The optimal solution of (13) with the integrality constraint will be a vector with exactly $M$ nonzero elements with the value $\frac{1}{M}$. Therefore, any encoding scheme just needs to use $M$ symbols of the associated channel with equal probabilities, simplifying the encoding operation.
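The difference between two and three interference levels can be observed directly on small instances of the constraint matrix. The brute-force check below tests every square submatrix, so it is only practical for tiny cases; the function names are illustrative and the determinant uses plain Laplace expansion.

```python
from itertools import combinations, product

def constraint_matrix(m, q):
    """Zero-one matrix of the marginal constraints: one row per
    (coordinate j, value i), one column per Q-tuple over {0, ..., m-1}."""
    tuples = list(product(range(m), repeat=q))
    return [[1 if t[j] == i else 0 for t in tuples]
            for j in range(q) for i in range(m)]

def determinant(mat):
    """Integer determinant by Laplace expansion along the first row."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for c, v in enumerate(mat[0]):
        if v:
            minor = [row[:c] + row[c + 1:] for row in mat[1:]]
            total += (-1) ** c * v * determinant(minor)
    return total

def is_totally_unimodular(mat):
    """True when every square submatrix has determinant -1, 0 or 1."""
    n_rows, n_cols = len(mat), len(mat[0])
    for k in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                sub = [[mat[r][c] for c in cols] for r in rows]
                if determinant(sub) not in (-1, 0, 1):
                    return False
    return True

print(is_totally_unimodular(constraint_matrix(2, 2)))  # Q = 2
print(is_totally_unimodular(constraint_matrix(2, 3)))  # Q = 3
```

For $M = 2$ and $Q = 3$, the rows for value 0 in each coordinate together with the columns (0,0,1), (0,1,0) and (1,0,0) form a $3 \times 3$ submatrix with determinant $-2$.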

VI. Optimal Precoding 

The general structure of a communication system for the channel defined in (6) is shown in Fig. 3. Any encoding and decoding scheme for the associated channel can be translated to an encoding and decoding scheme for the original channel defined in (6). A message $w$ is encoded into a block of length $n$ composed of input symbols of the associated channel. There are $M^Q$ input symbols. However, we showed that the maximum rate with uniformity and integrality constraints can be achieved by using just $M$ input symbols of the associated channel with equal probabilities. The optimal $M$ input symbols of the associated channel are obtained by solving the linear programming problem (13) with the integrality constraint. Those $M$ input symbols of the associated channel define the optimal precoding operation: for any $t$ that belongs to the set of $M$ optimal input symbols, the precoder sends the $q$th component of $t$ if the current interference symbol is $s_q$, $q = 1, \ldots, Q$. Based on the received sequence, the receiver decodes $\hat{w}$ as the transmitted message.
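The precoding rule itself is a table lookup, as the following sketch shows. The names `precode` and `one_shot_decode`, the zero-based interference index, and the example codebook are assumptions for illustration; the decoder shown covers only the noise-free one-shot case, not a general decoder for the noisy channel.

```python
def precode(symbols, interference_indices):
    """For each channel use, send the component of the associated-channel
    symbol t selected by the index q of the current interference level s_q."""
    return [t[q] for t, q in zip(symbols, interference_indices)]

def one_shot_decode(y, codebook, ss):
    """Noise-free one-shot decoding: return the index of the message whose
    output multi-set contains y, or None if no message matches."""
    for w, t in enumerate(codebook):
        if any(x + s == y for x, s in zip(t, ss)):
            return w
    return None

# Hypothetical codebook with mutually disjoint output multi-sets for
# signal points {0, 1, 2, 3} and interference levels (0, 1, 3).
codebook = [(0, 3, 1), (1, 0, 2), (2, 1, 3), (3, 2, 0)]
ss = (0, 1, 3)
messages = [2, 0, 3]
states = [1, 2, 0]                       # observed interference indices
sent = precode([codebook[w] for w in messages], states)
received = [x + ss[q] for x, q in zip(sent, states)]
print([one_shot_decode(y, codebook, ss) for y in received])
```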



Fig. 3. General structure of the communication system for channels with 
causally-known discrete interference. 



VII. Conclusion 

In this paper, we proved that the capacity of an SD-DMC with finite input alphabet $\mathcal{X}$ and finite state alphabet $\mathcal{S}$ and with causally known i.i.d. state sequence at the encoder can be achieved by using at most $|\mathcal{X}||\mathcal{S}| - |\mathcal{S}| + 1$ out of $|\mathcal{X}|^{|\mathcal{S}|}$ input symbols of the associated channel. As an example of state-dependent channels with side information at the encoder, we investigated $M$-ary signal transmission over the AWGN channel with additive $Q$-level interference, where the sequence of interference symbols is known causally at the transmitter.

For the noise-free channel, provided that the signal points are equally spaced, we proposed a one-shot coding scheme that uses $M$ input symbols of the associated channel to achieve the capacity $\log_2 M$ bits.

We considered the transmission schemes with uniform pmfs for $X_1, \ldots, X_Q$. For this so-called uniform transmission, the optimal input probability assignment with at most $MQ - Q + 1$ nonzero elements can be obtained by solving the linear optimization problem (13). The optimal solution to (13) with the integrality constraint has exactly $M$ nonzero elements. For the case $Q = 2$, we showed that the integrality constraint does not reduce the maximum achievable rate. The loss in rate (if there is any) incurred by imposing the integrality constraint in the general case is a problem to be explored.